Detecting Overlapping Link Communities by Finding
Local Minima of a Cost Function with a Memetic Algorithm

Part 1: Problem and Method

Abstract

We propose an algorithm for detecting communities of links in networks which uses local information, is based on a new evaluation function, and allows for pervasive overlaps of communities. The complexity of the clustering task requires the application of a memetic algorithm that combines probabilistic evolutionary strategies with deterministic local searches. In Part 2 we will present results of experiments with citation networks.

1 Introduction 

Communities in networks are commonly defined as cohesive subgraphs which are well separated from the rest of the network. This vague concept of communities is operationalised in a variety of ways (Fortunato 2010). The utility of algorithms for the detection of communities in networks partly depends on their 'conceptual fit', i.e. on the degree to which they match properties of the phenomenon that is represented (Hric, Darst, and Fortunato 2014). Achieving such a conceptual fit may require unusual combinations of ideas from network analysis, as is the case with the question and the algorithm presented in this paper.

Consider the following three properties of a network and the task of community detection. First, links between nodes contain better information about communities than the nodes that are to be clustered. In this case, link clustering appears to be the method of choice. Constructing communities by clustering links has been proposed by Evans and Lambiotte (2009) and by Ahn, Bagrow, and Lehmann (2010) as a method for the construction of overlapping communities of nodes. In addition, clustering links is likely to be advantageous whenever the information asymmetry described above occurs, i.e. whenever links rather than nodes have the real-world properties whose similarity shall be reflected by clusters.

Second, overlapping communities must be a possible outcome of the algorithm because the real-world phenomenon under investigation is known to have such a structure. For the same reason, pervasive overlaps must be possible, i.e. overlaps that extend to all nodes rather than just the boundary nodes of a community. The construction of overlapping communities is by now a well-known and frequently addressed problem of network analysis (Fortunato 2010; Xie, Kelley, and Szymanski 2013; Amelio and Pizzuti 2014).

Third, the phenomena to be represented by communities are local in that they emerge from local interactions represented by neighbouring nodes and links in the network. If this is the case, the use of local rather than global information may return better communities and a better community structure of the network (Clauset 2005; Lancichinetti, Fortunato, and Kertész 2009; Havemann, Heinz, Struck, and Gläser 2011).

All three ideas have been developed in network analysis. However, as reviews of algorithms indicate (Fortunato 2010; Xie et al. 2013; Amelio and Pizzuti 2014), link clustering, pervasively overlapping communities, and the use of local information have not yet been combined, possibly because the task for which this combination is necessary has not yet arisen.^1

^1 The only apparent exception is the work by Lei Pan et al. which, however, compromises in two respects: global information is used in the end, and there is no pervasive overlap because link clusters are disjoint (Pan, Wang, Xie, and Liu 2011; Pan, Wang, and Xie 2012). Furthermore, they differ from our approach because they propose an evaluation function for link clustering which is derived within the node clustering approach.

There is at least one task for which this combination of link-based approach, pervasive overlaps and local approach is necessary, namely the detection of thematic structures (topics) in networks of papers.

In networks of papers and their cited sources, citation links (links between a publication and the sources it cites) are thematically more homogeneous than nodes (papers) and thus provide better information for clustering than the papers themselves. While papers commonly belong to more than one scientific topic, many citation links can be assumed to be homogeneous in that the link between paper and source belongs to only one topic. If it belongs to more than one topic, these topics are often not very distant from each other.

Scientific topics are known to overlap pervasively, which means that their reconstruction as communities of papers must reflect this pervasive overlap. Topics are also locally emergent phenomena in that they represent coinciding and mutually referring perspectives of researchers (the authors of the papers).

In order to reconstruct scientific topics from networks of papers and their cited sources, then, we need an algorithm that clusters links, can construct pervasively overlapping communities, and uses mainly local information. In this paper, we present such an algorithm (in Part 1) and its application to citation networks (in Part 2). We propose a local cost function for the independent evaluation of each link community by relating its external to its total connectivity in the network. The cost function is almost completely based on local information; the only global information used is the number of links in the whole network. The independent evaluation of each subgraph with a local cost function means that communities can be constructed independently from each other, which enables pervasive overlaps.

The cost function we propose for subgraph evaluation is solely based on the network's topology and not on link similarity. Generally, clustering by optimising a (global or local) evaluation function needs no measure of similarity of clustered elements but results in clusters the elements of which are seen as similar in some sense. In contrast, the approach to link clustering proposed by Ahn et al. (2010) is based on link similarity. The authors estimate the similarity of two links by comparing their sets of neighbouring nodes. This is not very appropriate for citation links because we would estimate thematic similarity of thematically nearly homogeneous elements (citation links) with sets of very inhomogeneous elements (papers, cited sources). In the case of citation networks, it would be better to measure link similarity by using textual information from citing and cited documents.

The local construction of topics, their varying size and pervasive overlaps make it likely that topics form a poly-hierarchy, i.e. a hierarchy where a smaller topic can be a subtopic of two or more larger topics that have no hierarchical subtopic relation. This poly-hierarchy of topics should be reflected in a poly-hierarchy of communities.

Communities without sub-communities can 
be well separated and very cohesive, too, but 
inside larger communities there can exist well 
separated sub-communities which diminish the 
cohesion of their super-community. 

Since the cost landscape of link communities has many local minima, purely deterministic search strategies are not efficient. This is why we designed a memetic search that combines an evolutionary algorithm with deterministic adjustments in the cost landscape. Evolutionary algorithms have already been used for identifying communities in networks (Fortunato 2010, p. 106). Some authors have even applied evolutionary algorithms to link clustering, but all used global evaluation functions (Pizzuti 2009; Li, Zhang, Wang, Liu, and Zhang 2013; Shi, Cai, Fu, Dong, and Wu 2013). Memetic evolutionary algorithms have also been applied to reconstruct communities, but only for node clustering and only with global evaluation functions (Gong, Fu, Jiao, and Du 2011; Pizzuti 2012; Gach and Hao 2012; Ma, Gong, Liu, Cai, and Jiao 2014).

2 Strategy 

The strategy we apply in response to the three challenges described in the introduction consists of three main steps. We develop an evaluation function for link communities that uses local information. This evaluation function makes it possible to construct each community independently from all others, which in turn enables pervasive overlaps because inner links (links all of whose neighbours are community members) of one community can also be inner links of another community. We then design an algorithm that constructs local communities.

For the first step, we followed a suggestion by Evans and Lambiotte (2009) to obtain link clusters by clustering vertices in a network's line graph. We defined a local cost function $\Psi(L)$ in the line-graph approach which we call ratio node-cut. It can be used to identify link communities by finding local minima in the cost landscape. Since $\Psi(L)$ evaluates the boundary between a subgraph and the rest of the network, communities can be constructed independently of all other communities.

The cost landscape of $\Psi(L)$ is often very rough, i.e. it has many local minima that may correspond to very similar subgraphs. Therefore, the resolution of the algorithm must be defined by setting a minimum distance (number of links that differ) between subgraphs corresponding to different local minima. We define the range of a community as a distance in which no subgraph exists that has a lower $\Psi$-value.

Since the task of finding communities in large networks is always very complex, heuristics must be applied. This applies even more strongly to link clustering because networks contain many more links than nodes, and particularly to the rough $\Psi$-landscape. We chose an evolutionary algorithm but accelerate evolution by combining it with a deterministic local search in the cost landscape. This approach is called memetic (Neri, Cotta, and Moscato 2012). Memetic algorithms can also find local optima of a local cost function (Vitela and Castaños 2012).

In evolutionary algorithms, individuals occupy places in the cost (or fitness) landscape. In our local algorithm, populations are sets of different subgraphs. We start with a random initialisation of the population of some definite size. The genetic operators of crossover, mutation, and selection are repeatedly applied to move the population into optima. In memetic algorithms each crossover and each mutation is followed by a local search.

In large networks, exploring the cost landscape by adding or removing individual links is very time-consuming. We therefore begin the search with a coarse search phase that adds or removes groups of links by adding or removing nodes with all their links, and follow it with a fine search phase, namely link-wise memetic evolution or at least a link-wise local search.

3 The cost function: ratio node-cut

3.1 Node-induced and link-induced subgraphs

Traditionally, the boundary of a community is drawn between nodes and therefore cuts the links between nodes inside and outside the community. If we consider communities as clusters of links rather than nodes, the perspective must be reversed. While the boundary of a node community cuts links, the boundary of a link community cuts nodes.

A node community is a connected subgraph defined by a node set $C$. It contains all links existing between nodes in $C$. A link community is a connected link-induced subgraph. It contains all nodes attached to links of a given set $L$. There can be links existing between a link community's nodes which are not in $L$.

Cost functions of a subgraph can be defined by relating a measure of external to a measure of total connectivity. This ratio should be minimal for well separated and cohesive subgraphs, i.e. for communities.

Node communities can be defined as connected subgraphs corresponding to minima in cost landscapes where places correspond to node-induced subgraphs. Correspondingly, link communities can be defined as connected subgraphs corresponding to minima in cost landscapes where places correspond to link-induced subgraphs.

In the following, we only consider connected unweighted graphs $G = (V, E)$. The number of edges (or links) is $m = |E|$, the number of vertices (or nodes) is $n = |V|$. With $k_i$ we denote the degree of node $i$. The internal degree of node $i$, denoted by $k_i^{\mathrm{in}}(L)$, is the number of links attached to node $i$ which are in link set $L$. The external degree of node $i$ is $k_i^{\mathrm{out}}(L) = k_i - k_i^{\mathrm{in}}(L)$.

3.2 External connectivity 

We first consider measures of external connectivity of a subgraph which are useful for constructing node or link communities. The simplest measure of external connectivity of a node-induced subgraph is the cut size, which equals the sum of weights of boundary links, i.e. the links connecting the subgraph with the rest of the graph (Fortunato 2010, p. 92). If link weights represent electrical conductance, cut size measures the total conductance of all boundary links. Cut size can be calculated as the sum of external degrees $k_i^{\mathrm{out}}(C)$ of boundary nodes (subgraph members with boundary links).

Applying these considerations to the external connectivity of a link-induced subgraph leads to a simple measure of external connectivity as a sum over boundary nodes:

$$\sigma(L) = \sum_{i=1}^{n} \frac{k_i^{\mathrm{in}}(L)\, k_i^{\mathrm{out}}(L)}{k_i} \qquad (1)$$


Only for boundary nodes of $L$ do we have $k_i^{\mathrm{in}}(L)\, k_i^{\mathrm{out}}(L) > 0$. That means we can restrict the sum in the formula to boundary nodes. In function $\sigma(L)$ the external degrees $k_i^{\mathrm{out}}$ are weighted with the subgraph membership grade $k_i^{\mathrm{in}}/k_i$ of the boundary nodes. The function $\sigma(L)$ can be derived from the total conductance or cut size of link sets in the graph's line graph if the line graph's edges are weighted with $1/k_i$, a weighting proposed by Evans and Lambiotte (2009). The derivation can be found in Appendix A.

Each term of $\sigma(L)$ equals the conductance of a boundary node $i$, i.e. the total conductance for currents flowing out of the subgraph through this node. We call $\sigma(L)$ the node cut of a link-induced subgraph.
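As an illustration of equation 1, the following minimal sketch computes the node cut of a link set. It assumes a graph given as a list of node pairs and a link set that is a subset of it; the names `degrees`, `node_cut`, `BOW_TIE`, and `TRIANGLE` as well as the node labels are ours and do not come from the paper. Exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction


def degrees(links):
    """Count how many of the given links are attached to each node."""
    k = Counter()
    for u, v in links:
        k[u] += 1
        k[v] += 1
    return k


def node_cut(all_links, link_set):
    """Node cut sigma(L) of equation 1 for a link set L within the graph."""
    k = degrees(all_links)        # degrees k_i in the whole graph
    k_in = degrees(link_set)      # internal degrees k_i^in(L)
    # Nodes not attached to L have k_i^in = 0 and are skipped;
    # inner nodes have k_i^out = 0 and contribute zero.
    return sum(Fraction(k_in[i] * (k[i] - k_in[i]), k[i]) for i in k_in)


# Bow-tie graph: two triangles sharing the central node "c" (labels are ours).
BOW_TIE = [("a", "b"), ("a", "c"), ("b", "c"),
           ("c", "d"), ("c", "e"), ("d", "e")]
TRIANGLE = {("a", "b"), ("a", "c"), ("b", "c")}

print(node_cut(BOW_TIE, TRIANGLE))   # only the central node is cut
```

For the triangle, the central node is the only boundary node, with two internal and two external links out of a degree of four.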


3.3 Internal and total connectivity 

Now we discuss measures of internal and total connectivity of subgraphs induced by node and by link sets, respectively. In the case of node-induced subgraphs, $k_{\mathrm{in}}(C) = \sum_{i \in C} k_i^{\mathrm{in}}(C)$ is an appropriate measure of internal connectivity of node set $C$. Total connectivity of $C$ is then the sum of degrees of all nodes in $C$:

$$k_{\mathrm{total}}(C) = \sum_{i \in C} \left( k_i^{\mathrm{in}} + k_i^{\mathrm{out}} \right) = \sum_{i \in C} k_i \qquad (2)$$

For a link-induced subgraph we can use the sum of internal degrees, weighted with their membership, as a measure of internal connectivity:

$$\tau(L) = \sum_{i=1}^{n} \frac{k_i^{\mathrm{in}}(L)}{k_i}\, k_i^{\mathrm{in}}(L). \qquad (3)$$

The sum is effectively restricted to nodes attached to links in $L$ because other nodes have $k_i^{\mathrm{in}}(L) = 0$. Total connectivity of $L$ is then given by the sum $\sigma(L) + \tau(L) = \sum_{i=1}^{n} k_i^{\mathrm{in}}(L) = k_{\mathrm{in}}(L) = 2|L|$. The derivation can be found in Appendix A.
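The identity $\sigma(L) + \tau(L) = 2|L|$ can be checked numerically. The sketch below is only an illustration under our own assumptions: the graph is a list of node pairs, the link set is a subset of it, and the function name `connectivity` and the example graph labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def connectivity(all_links, link_set):
    """Return (sigma, tau): node cut and internal connectivity of L."""
    k, k_in = Counter(), Counter()
    for u, v in all_links:
        k[u] += 1
        k[v] += 1
    for u, v in link_set:
        k_in[u] += 1
        k_in[v] += 1
    # sigma: internal degree times external degree, divided by degree
    sigma = sum(Fraction(k_in[i] * (k[i] - k_in[i]), k[i]) for i in k_in)
    # tau: internal degree weighted with membership grade k_in / k
    tau = sum(Fraction(k_in[i] ** 2, k[i]) for i in k_in)
    return sigma, tau


# Bow-tie graph with central node "c" (labels are ours).
BOW_TIE = [("a", "b"), ("a", "c"), ("b", "c"),
           ("c", "d"), ("c", "e"), ("d", "e")]
L = {("a", "b"), ("a", "c"), ("c", "d")}

sigma, tau = connectivity(BOW_TIE, L)
assert sigma + tau == 2 * len(L)   # total connectivity k_in(L) = 2|L|
```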


3.4 Cost function 


Relating external to total connectivity leads us to cost functions whose minima correspond to well separated and cohesive subgraphs. In addition, we achieve a size normalisation when we divide external by total connectivity. This is welcome because the boundary length (measured by external connectivity) tends to increase with size (here measured by total connectivity $k_{\mathrm{in}}(L) = 2|L|$), at least for not too large subgraphs in not too small networks. If a subgraph occupies more than one half of the network, its boundary tends to become shorter with increasing size. A simple size normalisation that accounts for the finite size of the network is achieved by adding to the external-total ratio of a subgraph the same ratio of its complement. For small subgraphs in a large network the second ratio is very small. For node-induced subgraphs this normalisation was introduced by Wei and Cheng (1989) and named ratio cut. For link-induced subgraphs we analogously define a cost function ratio node-cut as

$$\Psi(L) = \frac{\sigma(L)}{k_{\mathrm{in}}(L)} + \frac{\sigma(E \setminus L)}{k_{\mathrm{in}}(E \setminus L)} \qquad (4)$$

$$\phantom{\Psi(L)} = \frac{\sigma(L)}{k_{\mathrm{in}}(L)\left(1 - k_{\mathrm{in}}(L)/2m\right)}. \qquad (5)$$


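The expression on the right-hand side is obtained because $\sigma(E \setminus L) = \sigma(L)$ and $k_{\mathrm{in}}(E \setminus L) = 2m - k_{\mathrm{in}}(L)$, so that the two fractions share the numerator and sum to $2m\,\sigma(L) / [k_{\mathrm{in}}(L)(2m - k_{\mathrm{in}}(L))]$. Ratio node-cut $\Psi$ is not strictly local, but the only global information needed here is the total number of links $m$. In the limit of small subgraphs in large networks we achieve approximately strict locality because we have $k_{\mathrm{in}}(L) \ll 2m$ and we therefore obtain

$$\Psi(L) \approx \frac{\sigma(L)}{k_{\mathrm{in}}(L)}, \qquad (6)$$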


which equals a strictly local cost function for the construction of link communities introduced by us earlier (Havemann et al. 2012). Our cost function $\Psi(L)$ rewards separation of link community $L$ but not really its cohesion. Yang and Leskovec (2012) found that evaluating node subgraphs with conductance (a measure analogous to $\sigma(L)/k_{\mathrm{in}}(L)$ in the world of node communities) can also lead to communities with low cohesion.

That means that by using cost function $\Psi$ we emphasize separation and require only a minimal cohesion of subgraphs, which is expressed by the demand that subgraphs must be connected. Otherwise an unconnected subgraph with parts in very different regions of the network could be a community.

Function $\sigma(L)$ vanishes for the empty subgraph with $L = \emptyset$ and for the full graph with $L = E$. In both cases, the denominator of the cost function also vanishes and we obtain zero divided by zero, but it makes sense to define $\Psi(E) = \Psi(\emptyset) = 1$ because $\Psi$ of one link $(1, 2)$ with vanishing weight $w_{12}$ approaches 1:

$$\Psi((1,2)) = \frac{w_{12}\left[(k_1 - w_{12})/k_1 + (k_2 - w_{12})/k_2\right]}{2 w_{12}\left(1 - 2 w_{12}/2m\right)}. \qquad (7)$$


Our cost function is symmetric: $\Psi(L) = \Psi(E \setminus L)$, i.e. the cost function is the same for a link-induced subgraph and the subgraph induced by the complementary link set $E \setminus L$.
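The following minimal sketch evaluates the ratio node-cut in the closed form of equation 5 and checks its symmetry on the bow-tie graph. It assumes an unweighted graph given as a duplicate-free list of node pairs and a link set that is a subset of it; the function name `ratio_node_cut` and the node labels are our own choices.

```python
from collections import Counter
from fractions import Fraction


def ratio_node_cut(all_links, link_set):
    """Psi(L) = sigma(L) / (k_in(L) * (1 - k_in(L) / 2m)), equation 5."""
    link_set = set(link_set)
    m = len(all_links)
    if len(link_set) in (0, m):
        return Fraction(1)            # convention Psi(empty) = Psi(E) = 1
    k, k_in = Counter(), Counter()
    for u, v in all_links:
        k[u] += 1
        k[v] += 1
    for u, v in link_set:
        k_in[u] += 1
        k_in[v] += 1
    sigma = sum(Fraction(k_in[i] * (k[i] - k_in[i]), k[i]) for i in k_in)
    k_in_total = 2 * len(link_set)    # total connectivity k_in(L) = 2|L|
    return sigma / (k_in_total * (1 - Fraction(k_in_total, 2 * m)))


# Bow-tie graph with central node "c" (labels are ours).
BOW_TIE = [("a", "b"), ("a", "c"), ("b", "c"),
           ("c", "d"), ("c", "e"), ("d", "e")]
LEFT = {("a", "b"), ("a", "c"), ("b", "c")}
RIGHT = set(BOW_TIE) - LEFT

# Symmetry: a link set and its complement have the same cost.
assert ratio_node_cut(BOW_TIE, LEFT) == ratio_node_cut(BOW_TIE, RIGHT)
print(ratio_node_cut(BOW_TIE, LEFT))
```

Each triangle has node cut 1 and total connectivity 6 in a graph with $m = 6$, which gives $\Psi = 1/3$.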


3.5 The cost landscape 

Each place in the cost landscape represents a link-induced subgraph. Two places in the landscape have a direct relation if and only if the corresponding subgraphs differ in one link. The height of each place is given by the value of the cost function $\Psi(L)$. The global minimum of the cost function is reached for a division of the set $E$ of all links that produces the two best link communities in terms of separation. As a simple example, we determined the $\Psi$-landscape of the bow-tie graph (Figure 1, for calculations see Appendix B). We expect a cut through the central node to be the best division in two link communities (the two triangles). Indeed, the landscape has two minima with $\Psi = 1/3$, which correspond to the two triangles. There are no further local minima.
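For a graph as small as the bow-tie, the whole landscape can be enumerated by brute force. The sketch below is our own illustration, not the calculation of Appendix B: it lists all 64 link subsets, treats two subsets as neighbours if they differ in one link, and reports the places whose cost is strictly lower than that of all neighbours. Connectedness is not checked, and the names `psi`, `landscape`, and `local_minima` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def psi(all_links, link_set):
    """Ratio node-cut (equation 5) with Psi(empty) = Psi(E) = 1."""
    m, size = len(all_links), len(link_set)
    if size in (0, m):
        return Fraction(1)
    k, k_in = Counter(), Counter()
    for u, v in all_links:
        k[u] += 1
        k[v] += 1
    for u, v in link_set:
        k_in[u] += 1
        k_in[v] += 1
    sigma = sum(Fraction(k_in[i] * (k[i] - k_in[i]), k[i]) for i in k_in)
    return sigma / (2 * size * (1 - Fraction(size, m)))


def landscape(all_links):
    """Map every link subset (as a frozenset) to its cost."""
    return {frozenset(c): psi(all_links, c)
            for r in range(len(all_links) + 1)
            for c in combinations(all_links, r)}


def local_minima(all_links):
    """Places that are strictly lower than all places one link away."""
    cost = landscape(all_links)
    minima = []
    for place, value in cost.items():
        # Toggling one link adds it if absent and removes it if present.
        neighbours = (place ^ frozenset([e]) for e in all_links)
        if all(cost[n] > value for n in neighbours):
            minima.append(place)
    return minima


BOW_TIE = [("a", "b"), ("a", "c"), ("b", "c"),
           ("c", "d"), ("c", "e"), ("d", "e")]
for place in local_minima(BOW_TIE):
    print(sorted(place), psi(BOW_TIE, place))
```

Such an enumeration grows as $2^m$ and is only feasible for toy graphs, which is why the paper resorts to heuristic search for real networks.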

We do not restrict the search for link communities to finding only the global minimum but define a link community as a connected link-induced subgraph which corresponds to any local minimum in the $\Psi$-landscape. Since the $\Psi$-landscape of larger graphs contains many local minima, we need a filter to select the locally best link communities. For this reason, we restrict our search to those minima with a sufficiently large distance to any lower place in the cost landscape. Thus, we have to define the resolution of the search by defining this minimal distance in the landscape. The appropriate resolution depends on the research question about the phenomenon represented by the network. The extent to which two communities should differ in content (of links) to consider them as different depends on the question asked about communities.

Another place in the cost landscape is reached by adding links to and by removing links from the subgraph corresponding to the starting place. The distance between two places in the cost landscape equals the sum of the number of links we have to add and to exclude. In other words, the distance is the size of the symmetric difference between the two link sets. We define the range of a community as the minimal distance to a subgraph with lower cost. Within a community's range there is no better subgraph. The resolution of a search for communities can be defined as the minimal range of communities that are accepted as valid solutions. Depending on the network's real-world background, a relative resolution can be more appropriate. That means we demand that any valid community should have a range which is larger than a certain percentage of its size.
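The definitions of distance, range, and resolution can be sketched as follows. This is only an illustration under our own assumptions: link sets are Python sets of node pairs, `known` is a hypothetical list of already evaluated (link set, cost) pairs, and a community whose bound exactly equals the threshold is accepted. Since only some places are known, the function yields an upper bound of the range, not the range itself.

```python
from fractions import Fraction


def distance(links_a, links_b):
    """Number of links to add plus links to remove: |A symmetric-difference B|."""
    return len(set(links_a) ^ set(links_b))


def range_upper_bound(community, community_cost, known):
    """Smallest distance to a known place with lower cost, or None."""
    lower = [distance(community, other)
             for other, cost in known if cost < community_cost]
    return min(lower) if lower else None


def passes_resolution(community, community_cost, known,
                      resolution, relative=False):
    """Keep a community unless a lower place is known within the resolution."""
    bound = range_upper_bound(community, community_cost, known)
    if bound is None:
        return True                   # provisionally kept
    # A relative resolution is a fraction of the community's size in links.
    threshold = resolution * len(community) if relative else resolution
    return bound >= threshold


# Example with costs taken from the bow-tie graph (labels are ours).
triangle = {("a", "b"), ("a", "c"), ("b", "c")}
two_links = {("a", "b"), ("a", "c")}
known = [(triangle, Fraction(1, 3)), (two_links, Fraction(15, 32))]

print(distance(triangle, two_links))
print(passes_resolution(two_links, Fraction(15, 32), known, resolution=2))
```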

In order to determine the range of a community we would need to know its whole environment up to the distance to the nearest lower place in the cost landscape. Otherwise, a lower place only determines an upper bound of the community's range. However, searching the whole environment of a subgraph is practically impossible for large networks. A selective search is necessary, which is why we apply evolutionary and deterministic greedy algorithms. If these algorithms find an upper bound smaller than the set resolution, we can deselect the community. If they do not, the community is provisionally kept but can later be replaced by a better community within its minimal range defined by the set resolution. We assume, however, that better solutions found later differ only in some links.

Figure 1: Bow-tie graph

4 Memetic search 

Memetic algorithms combine random evolution with deterministic local search. In this section, we describe

1. the local search we apply, called adaptation for short,

2. our implementation of the evolutionary approach,

3. the genetic operators of mutation, crossover, and selection we employ in the evolutionary approach.

The memetic algorithm is applied in the search for link communities, which can be done by exploring the cost landscape of a network by adding or removing individual links. For large subgraphs this is very time-consuming. We therefore split the search into a coarse phase, in which we add or remove nodes with all their links to other nodes in the subgraph, and a finer link-wise search, which is applied after communities have been identified by a node-wise search. After communities with a minimal range defined by the set resolution are found in a node-wise memetic search, they are subjected to a link-wise memetic search or at least a link-wise local search.

4.1 Local search 

The local search in the cost landscape applies a greedy algorithm for finding local cost minima that correspond to communities. The algorithm starts from the place occupied by the current subgraph and moves to subgraphs with lower $\Psi$-values. The algorithm is greedy because it always chooses the step that brings the biggest decrease or the smallest increase of $\Psi$. A step includes or excludes a node with all its links to the nodes already in a subgraph in node-wise local search, and includes or excludes an individual link in link-wise local search.

A valid community can be made invalid and replaced by a better one if the better one is within its minimal range, which is set by the resolution parameter. Therefore, the local search need not find subgraphs with lower cost in each step but can go a number of steps by 'tunneling' through 'barriers' in the landscape (areas with higher $\Psi$) before reaching lower values which invalidate the community at the tunnel's entrance. Tunneling makes the algorithm more efficient. The maximum length of a tunnel through a barrier of higher $\Psi$-values is determined by the set resolution.

The local search can begin with a series of either inclusions or exclusions of nodes (links). When no further improvement can be achieved, the search switches from inclusion to exclusion or vice versa. Inclusion and exclusion are continued until no further improvement is possible.

If the exclusion of nodes fragments a subgraph, we proceed with the subgraph's main component. In the link-wise local search the greedy algorithm is allowed to go through intermediary states representing unconnected subgraphs. At the end of the link-wise local search we determine all components of the subgraph. If the subgraph is unconnected, we repeat the procedure for each component until we obtain only connected subgraphs with minimal cost.

A greedy algorithm is efficient because the cost reduction for all possible cases of including a neighbour must be calculated only at the beginning of the local search. In the subsequent steps, we only calculate or recalculate cost reductions achieved by adding neighbours of the link (or node) included. Analogously, we proceed when excluding boundary nodes or links (Havemann et al. 2012, Appendix). Otherwise it would be more efficient to include or exclude just the first node (link) which reduces cost.
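The following is a strongly simplified sketch of a link-wise greedy local search, not the implementation described above. It assumes a graph given as a list of node pairs, recomputes the cost of every candidate from scratch instead of updating cost reductions incrementally, mixes inclusions and exclusions in each step instead of alternating between phases, and omits tunneling and the treatment of unconnected subgraphs. All function names are our own.

```python
from collections import Counter
from fractions import Fraction


def psi(all_links, link_set):
    """Ratio node-cut (equation 5) with Psi(empty) = Psi(E) = 1."""
    m, size = len(all_links), len(link_set)
    if size in (0, m):
        return Fraction(1)
    k, k_in = Counter(), Counter()
    for u, v in all_links:
        k[u] += 1
        k[v] += 1
    for u, v in link_set:
        k_in[u] += 1
        k_in[v] += 1
    sigma = sum(Fraction(k_in[i] * (k[i] - k_in[i]), k[i]) for i in k_in)
    return sigma / (2 * size * (1 - Fraction(size, m)))


def neighbouring_links(all_links, link_set):
    """Links outside the set that share a node with a link inside it."""
    nodes = {node for link in link_set for node in link}
    return [e for e in all_links
            if e not in link_set and (e[0] in nodes or e[1] in nodes)]


def greedy_local_search(all_links, start):
    """Steepest descent: take the best single-link step while it lowers Psi."""
    current = frozenset(start)
    cost = psi(all_links, current)
    while True:
        moves = [current | {e} for e in neighbouring_links(all_links, current)]
        moves += [current - {e} for e in current]
        if not moves:
            return current, cost
        best = min(moves, key=lambda links: psi(all_links, links))
        best_cost = psi(all_links, best)
        if best_cost >= cost:         # no step lowers the cost: local minimum
            return current, cost
        current, cost = best, best_cost


BOW_TIE = [("a", "b"), ("a", "c"), ("b", "c"),
           ("c", "d"), ("c", "e"), ("d", "e")]
community, cost = greedy_local_search(BOW_TIE, {("a", "b")})
print(sorted(community), cost)
```

Starting from a single outer link of the bow-tie graph, this descent ends in the triangle that contains the link.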


4.2 Evolution 

The general implementation of the memetic algorithm is described by Algorithm 1.^2 The genetic operators of crossover, mutation, and selection (described below) are applied to each generation of communities. Subgraphs generated by crossover and mutation are adapted by a local search. If the starting subgraph is not connected we replace it by its main component. Evolution is terminated when no better best community is found for many generations.

^2 The notation is inspired by a pseudocode given by Merz (2012).

4.3 Genetic operators

Mutation: We mutate a community with mutation variance $v < 1$ by changing at most a proportion $v$ of its links or nodes. In node-wise memetic evolutions we randomly exclude boundary nodes and then include the same number of neighbouring nodes. In link-wise memetics we experiment with two other mutation operators: we only exclude or include links and concentrate changes around one randomly chosen boundary node. (Details can be found in the Appendix of Part 2.)

Algorithm 1 Pseudocode of memetic evolution for one adapted seed

initialise population P by mutating the adapted seed with high variance several times and adapting mutants
while the best community is not too old do
    mutate the best community with low variance and adapt the mutants
    if an adapted mutant is new and its cost is lower than highest cost then
        add it to population P
    end if
    cross the best community with some randomly chosen communities and adapt the offspring
    if adapted offspring is new and its cost is lower than highest cost then
        add it to population P
    end if
    select the best communities so that the population size remains constant
    if there is no better best community for some generations and innovation rate is low then
        renew the population by mutating the best community with high variance and adapt mutants
        select the best communities so that the population size remains constant
    end if
end while

Crossover: From two parent subgraphs we construct two new individuals by taking intersection and union of the subgraphs as starting points for adaptive local searches. Of course, it has no effect to cross parents where one of them is part of the other one. Normally, evolutionary algorithms include some randomness in the crossover, which in our case would mean enlarging the intersection by some nodes or links from the union. In contrast, our crossing procedure is deterministic because the boundary of the union of two good communities should also be not too bad. The same holds for the intersection. Deterministic crossover should be (and is) done only once with the same parents. The only random element of our crossover is the random selection of parents.
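The deterministic crossover can be sketched in a few lines. The sketch below is an illustration under our own assumptions: communities are sets of links, `already_crossed` is a hypothetical bookkeeping set that records parent pairs, and the offspring are returned as raw starting points that would still have to be reduced to their main component and adapted by a local search.

```python
def crossover_once(parent_a, parent_b, already_crossed):
    """Return intersection and union of two parents as new starting points.

    Returns an empty list if one parent is part of the other (the offspring
    would only reproduce the parents) or if the pair was crossed before.
    """
    a, b = frozenset(parent_a), frozenset(parent_b)
    pair = frozenset([a, b])
    if a <= b or b <= a or pair in already_crossed:
        return []
    already_crossed.add(pair)         # deterministic, so once is enough
    return [a & b, a | b]


# Two overlapping link sets on a hypothetical path graph a-b-c-d-e.
first = {("a", "b"), ("b", "c"), ("c", "d")}
second = {("b", "c"), ("c", "d"), ("d", "e")}
crossed = set()

offspring = crossover_once(first, second, crossed)
print([sorted(child) for child in offspring])
print(crossover_once(second, first, crossed))   # same parents: nothing new
```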

Selection: From the old population and the results of mutations and crossovers we select the communities with lowest $\Psi$-values, keeping the population size constant. A new best community is only included if it is inside the minimal range of the best community of the original population. Disregarding the best communities outside the minimal range assures that we do not lose communities which can have a range above the minimum given by the resolution limit we apply. Deselected communities can be used as seeds for other memetic searches.

Renewal: Renewal means mutating the best community with high variance several times, adapting the mutants, and applying the usual selection procedure described above.

5 Concluding remarks 

In the forthcoming Part 2 of our paper we discuss test results.

Appendix


auxiliary bipartite graph. This becomes clear if 
we factorise the terms of the sum in equation 8: 


A Connectivity measures 

In this section we derive the connectivity mea¬ 
sures for link sets a{L) and km{L) from analogue 
measures in the line graph. We closely follow 
the arguments given in our earlier paper (Have- 
mann, Glaser, Heinz, and Struck 2012). 

We here use i,j = 1,... ,n to denote nodes 
and k,l = l,...,m for links. With C{L) we 
name the set of nodes attached to links in the 
subgraph induced by link set L. If a link k be¬ 
longs to L its membership fik{L) = 1 and zero 
otherwise. 

To construct a network’s line graph we first 
define an auxiliary bipartite graph obtained by 
putting a node on each link of the original net¬ 
work. The affiliation matrix B of the bipartite 
graph—also called its incidence matrix—has a 
row for each of the n original nodes and a col¬ 
umn for each of the m original links. Each link 
column contains only two non-zero elements, 
namely the elements in the rows of the nodes 
i and j connected by the link. We can project 
the bipartite graph back onto the original net¬ 
work with the product BB^ which equals its 
adjacency matrix A (except for the main diago¬ 
nal). 

We obtain the network’s line graph by the op¬ 
posite projection B^B of the bipartite graph. 
Evans and Lambiotte (2009) underline, that in 
all cases of practical intererest the line graph 
contains the same amount of information as the 
original network. Knowing B^B we can almost 
ever calculate BB^ and thus also the network’s 
adjacency matrix A. 

Because each node of the original network is 
represented as a clique in the line graph Evans 
and Lambiotte (2009) weighted the edges of the 
line graph with the inverse degree \/ki of the 
node i in the original network. They define the 
line graph’s adjacency matrix as 

( 8 ) 

i—l 

Weighting the line graph’s edges with the in¬ 
verse degrees of nodes in the original network is 
equivalent to an Euclidean normalisation of the 
nodes’ vectors in the affiliation matrix B of the 


Eki = 

2=1 


Eik Eil 

Vh Vh 


(9) 


Then we can shortly write E = with Dn; = 
Bik/\fki and verify the Euclidean normalisation 
of the n row vectors of D (for unweighted net¬ 
works for which we have B^^. = Bik)'. 


m m -I 

= = (10) 


k=l 


k=l 


k=l 


On the other hand, the projection of the nor¬ 
malised bipartite graph described by affiliation 
matrix D back on a network of the original nodes 
is given by . An element of adjacency ma¬ 
trix is given by 


DikDjk — 


k=l 


k=l 


BikBjk 

\J ki kj 


An 


s/kikj 


( 11 ) 


Thus, Euclidean normalisation of $B$’s row vectors
is equivalent to weighting each link in the
original (unweighted) network with the geometric
mean of its nodes’ inverse degrees. The
weighted graph described by adjacency matrix
$E$ is not the line graph of the unweighted network
described by adjacency matrix $A$ but of the
network weighted according to equation 11.
Whether this is a realistic weighting depends on
the real relations we model with the network.
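As a numerical illustration of equation 11, the following sketch computes $DD^T$ for an assumed example network (again a bow-tie graph with a hypothetical node labelling) and compares its off-diagonal elements with $A_{ij}/\sqrt{k_i k_j}$. The diagonal is excluded from the comparison because $BB^T$ differs from $A$ there.

```python
import math

# Hypothetical example network: bow-tie graph, node 2 is the centre.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
n, m = 5, len(links)
B = [[1 if i in link else 0 for link in links] for i in range(n)]
deg = [sum(row) for row in B]  # node degrees k_i

# Normalised affiliation matrix D and its projection W = D D^T.
D = [[B[i][k] / math.sqrt(deg[i]) for k in range(m)] for i in range(n)]
W = [[sum(D[i][k] * D[j][k] for k in range(m)) for j in range(n)]
     for i in range(n)]

# Unweighted adjacency matrix A.
A = [[0] * n for _ in range(n)]
for i, j in links:
    A[i][j] = A[j][i] = 1

# Off-diagonal elements: W_ij = A_ij / sqrt(k_i k_j).
matches = all(
    math.isclose(W[i][j], A[i][j] / math.sqrt(deg[i] * deg[j]), abs_tol=1e-12)
    for i in range(n) for j in range(n) if i != j
)
print(matches)
```

In this example a link between two nodes of degree 2 receives the weight 1/2, and a link to the centre (degree 4) receives the weight $1/\sqrt{8}$.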

Now we calculate internal connectivity $\tau(L)$
as the sum of internal degrees of vertices in the
line graph:

$$\tau(L) = \sum_{k,l=1}^{m} \mu_k(L) E_{kl} \mu_l(L) = \sum_{k,l=1}^{m} \mu_k(L) \sum_{i=1}^{n} \frac{B_{ik} B_{il}}{k_i} \mu_l(L). \qquad (12)$$


In the same way, we can calculate external connectivity
$\sigma(L)$ as the sum of external degrees in
the line graph:

$$\sigma(L) = \sum_{k,l=1}^{m} \mu_k(L) E_{kl} (1 - \mu_l(L)) = \sum_{k,l=1}^{m} \mu_k(L) \sum_{i=1}^{n} \frac{B_{ik} B_{il}}{k_i} (1 - \mu_l(L)). \qquad (13)$$










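Now we use the relations

$$\sum_{k=1}^{m} \mu_k(L) B_{ik} = k_i^{\mathrm{in}}(L)$$

and

$$\sum_{k=1}^{m} (1 - \mu_k(L)) B_{ik} = k_i^{\mathrm{out}}(L),$$

which directly follow from the definition of the
incidence matrix $B$. Thus, we get

$$\tau(L) = \sum_{i=1}^{n} \frac{(k_i^{\mathrm{in}}(L))^2}{k_i}$$

and

$$\sigma(L) = \sum_{i=1}^{n} \frac{k_i^{\mathrm{in}}(L) \, k_i^{\mathrm{out}}(L)}{k_i}.$$

From this we easily derive total connectivity of
a link-induced subgraph as the sum

$$\tau(L) + \sigma(L) = \sum_{i=1}^{n} k_i^{\mathrm{in}}(L) = k_{\mathrm{in}}(L).$$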
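The equivalence of the line-graph sums and the node-based expressions can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions: the example network is a bow-tie graph with a hypothetical node labelling, and the list mu is taken to be a 0/1 membership indicator of the links in $L$. Function and variable names are ours.

```python
from fractions import Fraction

# Hypothetical example network: bow-tie graph, node 2 is the centre.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
n, m = 5, len(links)
B = [[1 if i in link else 0 for link in links] for i in range(n)]
deg = [sum(row) for row in B]
E = [[sum(Fraction(B[i][a] * B[i][b], deg[i]) for i in range(n))
      for b in range(m)] for a in range(m)]

def connectivity_via_line_graph(mu):
    """tau and sigma as double sums over the weighted line graph E."""
    tau = sum(mu[a] * E[a][b] * mu[b] for a in range(m) for b in range(m))
    sigma = sum(mu[a] * E[a][b] * (1 - mu[b])
                for a in range(m) for b in range(m))
    return tau, sigma

def connectivity_via_nodes(mu):
    """tau and sigma from internal and external node degrees."""
    k_in = [sum(mu[a] * B[i][a] for a in range(m)) for i in range(n)]
    k_out = [sum((1 - mu[a]) * B[i][a] for a in range(m)) for i in range(n)]
    tau = sum(Fraction(k_in[i] ** 2, deg[i]) for i in range(n))
    sigma = sum(Fraction(k_in[i] * k_out[i], deg[i]) for i in range(n))
    return tau, sigma, sum(k_in)

mu = [1, 1, 1, 0, 0, 0]  # assumed indicator: the triangle of links 1, 2, 3
tau_E, sigma_E = connectivity_via_line_graph(mu)
tau_N, sigma_N, k_in_total = connectivity_via_nodes(mu)
print(tau_E, sigma_E, k_in_total)
```

Both routes give the same pair of values for this link set, and their sum equals the total internal degree of the subgraph.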


B Cost-landscape of the bow-tie graph

For the bow-tie graph we expect two link communities,
namely the triangle {1, 2, 3} and its
complement {4, 5, 6}, cf. Figure 2 and Evans and
Lambiotte (2009). To describe the $2^6$ different
possible subgraphs it is advantageous to make
use of the spherical topology of any landscape
of subgraphs. Indeed, the cost-function landscape
of a graph’s subgraphs can be seen as the
surface of a globe

• with the whole and the empty graph at the 
poles, 

• with all possible subgraphs of the same size 
on each circle of latitude, and 

• with complementary subgraphs situated at 
antipodes. 

The neighbours of a place in the landscape can
be reached by adding an element to the set of
nodes (for node-induced subgraphs) or of links
(for link-induced subgraphs), respectively, or by
deleting an element from this set. That means
there are no direct relations between places on
the same circle of latitude. Steps (adding or
removing nodes or links) are moves between
neighbouring circles of latitude.
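The globe picture can be made concrete for a set of six links. The following sketch (with illustrative names of our own) groups all link sets by size, which corresponds to the circles of latitude, and generates the neighbours of a link set by adding or deleting exactly one link.

```python
from itertools import combinations

m = 6
all_links = frozenset(range(1, m + 1))

# Circles of latitude: all link sets of the same size.
latitudes = {
    size: [frozenset(c) for c in combinations(sorted(all_links), size)]
    for size in range(m + 1)
}
counts = [len(latitudes[size]) for size in range(m + 1)]

def neighbours(L):
    """Link sets reachable from L by adding or deleting exactly one link."""
    return [L ^ {x} for x in all_links]

def antipode(L):
    """The complementary link set."""
    return all_links - L

print(counts, sum(counts))
```

Every neighbour differs in size by exactly one, so each step moves to an adjacent circle of latitude, and the antipode of a set of size s lies on the circle of size m - s.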



Figure 2: Bow-tie graph with numbered links 


We define the north pole as corresponding to
the empty subgraph and the south pole as corresponding
to the whole graph. The $\Psi$-globe of
the bow-tie graph has five circles of latitude
corresponding to six subgraphs with one link, 15
with two, 20 with three, 15 with four, and six
with five links, respectively.

For the empty graph at the north pole $\sigma = 0$
and $\Psi = 1$ (by definition). The six single links
as the smallest real subgraphs are located at the
highest circle of latitude. The two outer links 1
and 6 have $\sigma = 1 \cdot 1/2 + 1 \cdot 1/2 = 1$ and $\Psi = 0.6$,
the four inner links have $\sigma = 1 \cdot 1/2 + 1 \cdot 3/4 = 5/4$
and $\Psi = 0.75$.

There are ten connected and five unconnected
subgraphs with two links:

• four connected subgraphs with one outer
link and one inner link (e.g. link set {1, 2})
resulting in $\sigma = 1 \cdot 1/2 + 1 \cdot 3/4 = 5/4$ and
$\Psi \approx 0.469$,

• six connected subgraphs with two inner
links (e.g. link set {2, 3}) and $\sigma = 1 \cdot 1/2 +
2 \cdot 2/4 + 1 \cdot 1/2 = 2$ and $\Psi = 0.75$,

• four unconnected subgraphs with one outer
and one inner link (e.g. link set {1, 4}) and
$\sigma = 9/4$ and $\Psi \approx 0.844$,

• one unconnected subgraph with two outer
links ({1, 6}) and $\sigma = 2$ and $\Psi = 0.75$.

On the equator of the $\Psi$-globe there are 20
triples of links which can be classified into four
types:

• the triangle {1, 2, 3} and its complement
{4, 5, 6} with $\sigma = 2 \cdot 2/4 = 1$ and $\Psi = 1/3$,

• four triples of inner links (e.g. link set
{2, 3, 4}) and their unconnected complements
(e.g. link set {1, 5, 6}) with $\sigma =
3/2 + 3/4 = 9/4$ and $\Psi = 0.75$,

• eight subgraphs with one of the two outer
links, one of the two attached inner links
and one of the two inner links not attached
to the outer link (e.g. link set {1, 2, 4}):
they all have $\sigma = 1 \cdot 1/2 + 2 \cdot 2/4 + 1 \cdot 1/2 = 2$
and $\Psi = 2/3$,

• the unconnected triple with one outer and
two inner links (set {1, 4, 5}) and its unconnected
complement (set {2, 3, 6}) with
$\sigma = 3$ and $\Psi = 1$.
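The classification of the 20 triples can be reproduced by enumeration. The sketch below assumes a node labelling that is consistent with the link numbers used in this appendix (links 1 and 6 outer, links 2 to 5 attached to the centre, here node 2); the node labels are hypothetical. It computes the external connectivity of every triple as the sum over nodes of internal degree times external degree divided by degree, and tallies the values.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Assumed labelling: link number -> (node, node); node 2 is the centre.
links = {1: (0, 1), 2: (0, 2), 3: (1, 2), 4: (2, 3), 5: (2, 4), 6: (3, 4)}
deg = Counter(node for pair in links.values() for node in pair)

def sigma(L):
    """External connectivity of the link set L."""
    k_in = Counter(node for link in L for node in links[link])
    return sum(Fraction(k_in[v] * (deg[v] - k_in[v]), deg[v]) for v in deg)

# Tally the sigma values of all link triples (the equator of the globe).
tally = Counter(sigma(triple) for triple in combinations(links, 3))
for value, count in sorted(tally.items()):
    print(value, count)
```

The tally groups the triples into the four types listed above: two with $\sigma = 1$, eight with $\sigma = 9/4$, eight with $\sigma = 2$, and two with $\sigma = 3$.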

On the two circles of latitude on the southern
hemisphere we find the complements of the
subgraphs on the northern hemisphere with the
same $\Psi$-values. Next to the equator we find 13
connected and two unconnected subgraphs with
four links each:

• two unconnected quadruples with one triangle
and the second outer link (e.g. link set
{1, 2, 3, 6}) which have $\sigma = 2$ and $\Psi = 0.75$,

• the central star with all four inner links
{2, 3, 4, 5} with $\sigma = 4/2 = 2$ and $\Psi = 0.75$,

• four subgraphs containing one of the two
triangles plus one of the two inner links (e.g.
link set {1, 2, 3, 5}) which all have $\sigma = 1 \cdot
3/4 + 1 \cdot 1/2 = 5/4$ and $\Psi \approx 0.469$,

• the four subgraphs with both outer links
and two inner links connecting them (e.g.
link set {1, 2, 5, 6}) with $\sigma = 2/2 + 4/4 = 2$
and $\Psi = 0.75$,

• the four subgraphs with one outer link and
three inner links (one of them attached to
the outer link, e.g. link set {1, 2, 4, 5}) with
$\sigma = 3/2 + 3/4 = 9/4$ and $\Psi \approx 0.844$.

All complements of the six single links, each
containing the five other links, are connected and have
the same $\Psi$-values as their single-link complements
(cf. above). The full graph at the south
pole with $\sigma = 0$ and $\Psi = 1$ is connected. The
$\Psi$-landscape of links has two local minima: the
two triangles have a locally and globally minimal
$\Psi = 1/3$. There are no other local minima.
Thus, we obtain the pair of complementary triangles
as the only solution.
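The whole landscape is small enough to be searched exhaustively. The sketch below rests on two assumptions that should be checked against the main text: a hypothetical node labelling consistent with the link numbers of Figure 2, and the cost function written as $\Psi(L) = \sigma(L) / [k_{\mathrm{in}}(L) (1 - k_{\mathrm{in}}(L)/(2m))]$ with $\Psi = 1$ at both poles. This form is inferred from the values listed in this appendix, which it reproduces; it is not quoted from the definition.

```python
from fractions import Fraction

# Assumed labelling: link number -> (node, node); node 2 is the centre.
links = {1: (0, 1), 2: (0, 2), 3: (1, 2), 4: (2, 3), 5: (2, 4), 6: (3, 4)}
m = len(links)
deg = {}
for a, b in links.values():
    deg[a] = deg.get(a, 0) + 1
    deg[b] = deg.get(b, 0) + 1

def sigma(L):
    """External connectivity: sum over nodes of k_in * k_out / k."""
    k_in = dict.fromkeys(deg, 0)
    for link in L:
        for node in links[link]:
            k_in[node] += 1
    return sum(Fraction(k_in[v] * (deg[v] - k_in[v]), deg[v]) for v in deg)

def psi(L):
    """Assumed form of the cost function (inferred, see the lead-in)."""
    k_in_total = 2 * len(L)
    if k_in_total in (0, 2 * m):  # empty graph and full graph
        return Fraction(1)
    return sigma(L) / (k_in_total * (1 - Fraction(k_in_total, 2 * m)))

all_links = frozenset(links)
subsets = [frozenset(x for x in all_links if mask >> (x - 1) & 1)
           for mask in range(2 ** m)]

# A local minimum is strictly cheaper than all neighbours, i.e. all link
# sets obtained by adding or deleting exactly one link.
local_minima = [L for L in subsets
                if all(psi(L) < psi(L ^ {x}) for x in all_links)]
print([sorted(L) for L in local_minima])
```

Under these assumptions the search returns the two triangles and no other link set.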